Evolutionary history of an Alpine Archaeognath (Machilis pallida): Insights from different variant

Abstract Reconstruction of species histories is a central aspect of evolutionary biology. Patterns of genetic variation within and among populations can be leveraged to elucidate evolutionary processes and demographic histories. However, interpreting genetic signatures and unraveling the contributing processes can be challenging, in particular for non‐model organisms with complex reproductive modes and genome organization. One way forward is the combined consideration of patterns revealed by different molecular markers (nuclear vs. mitochondrial) and types of variants (common vs. rare) that differ in their age, mode, and rate of evolution. Here, we applied this approach to RNAseq data generated for Machilis pallida (Archaeognatha), an Alpine jumping bristletail considered parthenogenetic and triploid. We generated de novo transcriptome and mitochondrial assemblies to obtain high‐density data to investigate patterns of mitochondrial and common and rare nuclear variation in 17 M. pallida individuals sampled from all known populations. We find that the different variant types capture distinct aspects of the evolutionary history and discuss the observed patterns in the context of parthenogenesis, polyploidy, and survival during glaciation. This study highlights the potential of different variant types to gain insights into evolutionary scenarios even from challenging but often available data and the suitability of M. pallida and the genus Machilis as a study system for the evolution of sexual strategies and polyploidization during environmental change. We also emphasize the need for further research which will be stimulated and facilitated by these newly generated resources and insights.

For example, mitochondrial and chloroplast markers are subject to stronger drift due to their lower Ne and have therefore frequently been used to investigate recent colonization events and changes in the connectivity among populations (Harrison, 1989;Schönswetter et al., 2005). In addition, incongruence of phylogenetic signals found in nuclear and mtDNA/ptDNA can inform about gene flow events (Avise et al., 1987;Baldo et al., 2011;Funk & Omland, 2003).
When high-density markers are available, a different approach can be applied that makes use of the fact that patterns of common and rare variants are affected differently by population genetic and demographic processes. On the one hand, common variants are expected to be shared among many individuals and populations and therefore likely reflect broader geographic patterns and older events. On the other hand, many rare variants likely reflect recent mutations that have not yet spread or could be older variants in a migration-drift 'quasi-equilibrium' (Barton & Slatkin, 1986;Slatkin, 1985;Slatkin & Takahata, 1985;reviewed in Gompert et al., 2014). Regardless of the underlying processes, however, these variants are expected to be spatially restricted under low dispersal and gene flow (Barton & Slatkin, 1986;discussed in Gompert et al., 2014). In fact, several studies found that rare variants reflect more recent and geographically localized processes in humans (Li et al., 2010;Mathieson & McVean, 2012Nelson et al., 2012).
The aforementioned approaches utilizing rare and common nuclear variants as well as mitochondrial markers can be combined to leverage the differences in resolution and sensitivity to population genetics processes of these three types of variants. We use this highly versatile framework to gain a better understanding of the complex evolutionary history of an Alpine apterygote insect and extend it to reassess previously proposed scenarios of switches in reproductive mode and ploidy .
The jumping bristletail genus Machilis (Archaeognatha or Microcoryphia) comprises 94 described species and with 55 species, the European Alps harbor the highest species diversity (de Jong et al., 2014;Dejaco et al., 2016). Due to their ancestral winglessness and multiple shared plesiomorphies with other insect orders, they have often been described as 'ancestral' or 'primitive' (Sturm & Machida, 2001). Given the lack of wings and the high degree of endemism in the genus, these bristletails are assumed to be slow dispersers (Dejaco et al., 2016;Sturm & Machida, 2001), however, modes of dispersal as well as the potential for dispersal are still unknown.
Genomic features associated with these transitions can include changes in heterozygosity, less effective positive selection, or a low transposable element load. However, these features are not universally applicable and often lineage-specific Jaron et al., 2021Jaron et al., , 2022. Karyotyping and flow cytometry results from Machilis indicate various instances of polyploidy (Gassner et al., 2014). Interestingly, polyploid animals tend to occur more often at higher latitudes (Lorch et al., 2016) with the vast majority, 80.5% of polyploid species, being found outside the tropics (David, 2022). This trend is particularly pronounced in insects, as 97% of polyploid species are found outside tropical regions (David, 2022). Moreover, glaciation is a significant environmental factor that promotes polyploidy and is of particular importance for insects and amphibians (David, 2022). Polyploidization and asexual reproduction may have played a role in the successful colonization of areas previously covered by glaciers (Lorch et al., 2016).
Although polyploidization is indeed strongly correlated with parthenogenetic reproduction in animals, this is not a general rule (Otto & Whitton, 2000), and there is also no such strict association in Machilis. In the species studied so far, sexuals were found to be diploid whereas asexuals were classified as either diploid or triploid (Gassner et al., 2014).
The high number of Machilis endemics and their distribution in and around the European Alps has sparked interest in their distribution and survival during the ice ages. Identifying and characterizing the different refugia during the last glacial maximum (LGM; 18,000 years before present) (van Husen, 1997) enables a better understanding of current species distributions and levels of diversity, both in terms of species numbers and genetic variation (Holderegger & Thiel-Egenter, 2009;Knowles, 2000;Schneeweiss & Schönswetter, 2011;Schönswetter et al., 2002Schönswetter et al., , 2005Stehlik, 2003;Westergaard et al., 2011).
Such information also increases our knowledge on how species can react to climate change and which biological features are associated with successful survival in a changing environment.
One open debate in this context revolves around the question of whether and how frequently arctic and high alpine species survived glaciation on nunataks (isolated mountain top areas protruding above the ice sheet) versus in peripheral areas (Schneeweiss & Schönswetter, 2011;Schönswetter et al., 2005). While several studies in plants suggest nunatak survival (Abbott & Brochmann, 2003;Bettin et al., 2007;Parisod & Besnard, 2007;Stehlik et al., 2002;Westergaard et al., 2011), there is little evidence in animals, with the exception of Trechus ground beetles from peripheral nunataks in the Orobian Alps (Lohse et al., 2011).
One case in animals, for which central nunatak survival has been suggested  is Machilis pallida (Janetschek, 1949), an endemic bristletail in the Eastern Alps, residing exclusively on carbonate rock scree above 2000 m above sea level (a.s.l.) Wachter et al., 2012). Based on mitochondrial and AFLP data for three populations, Wachter et al. (2012) proposed that this species survived LGM on both peripheral and central nunataks, indicating that central refugia may be more important than previously thought, and they suggest a complex evolutionary history. Further, the species is considered to be parthenogenetic Wachter et al., 2012) and triploid (Gassner et al., 2014).
Their early divergence in the insect phylogenetic tree, the occurrence of different reproductive systems and ploidy levels, and their patterns of distribution make bristletails in the genus Machilis an interesting study object in ecology and evolution. Here, we provide novel genomic resources for this system by presenting the first de novo assembled transcriptome obtained from 17 triploid individuals of M. pallida from six populations in the European Alps as well as a de novo assembled mitochondrial genome. We then outline our approach to obtain information on the population structure and biology of this Alpine bristletail species from the generated RNAseq data. Similar RNAseq data sets are available for many non-model species and while their analysis arguably can pose some challenges, they are also an underappreciated source of information that can be exploited to gain first insights into the evolutionary history of a species and inform the design of targeted follow-up studies (Feng et al., 2023;Mossion et al., 2022;Thorstensen et al., 2021). Here, we utilize our new resources to combine mitochondrial variation as well as common and rare nuclear variation and assess the distribution of differences and similarities in genetic signatures revealed by those sets of markers to better understand the evolutionary history of this species. Lastly, we discuss our findings in the context of current hypotheses on the reproductive mode, ploidy, and distribution of M.
pallida.   Table A1). This sampling design includes putative central nunatak populations (L, K, P, O) and putative peripheral nunatak populations (M and G) (see Dejaco et al., 2016;Gassner et al., 2014;Wachter et al., 2012). All individuals were transported alive to the Department of Ecology at the University of Innsbruck, Austria, and kept in plastic boxes supplemented with humid gravel from the collection site at 10°C and a 12:12 h light:dark cycle for 4 days to normalize gene expression. The individuals were identified based on morphological characters using the key in Dejaco et al. (2012) and to ensure that only adults were included only large-bodied individuals with a fully developed ovipositor were collected.

| RNA extraction, library preparation, and sequencing
Individuals were shock-frozen in liquid nitrogen and we performed RNA extraction using the Macherey and Nagel Nucleospin RNA kit following the instructions of the manufacturer. Library construction and sequencing were performed by IGATech. We then constructed a barcoded library for all individuals with the Illumina TruSeq Stranded mRNA Library prep kit following the manufacturer's instructions.
Libraries were then sequenced on an Illumina HiSeq 2500 platform in 250 bp paired-end rapid run mode.

| Transcriptome assembly and mitochondrial reads
We performed quality control of the raw data with FastQC (Andrews, 2010) before concatenating raw reads from the 17 individuals. BBtools v37.36 (Bushnell, 2021) was used for (i) quality trimming and quality filtering (bbduk.sh with a minimum length of 10), (ii) decontamination (with bbduk.sh and reference phix174_ill.  using Trinity v2.2 (Grabherr et al., 2011) with the following settings: strand-specific RNA-Seq read orientation, minimum contig length of 500, Jaccard clip, and normalization of reads. After assembly, we performed cleaning steps to remove contaminants and redundancy in the transcriptome: (i) Blobtools v1.0 (Kumar et al., 2013) was used for contaminant filtering with two Blastn (Altschul et al., 1990) evalue cutoffs of 1e−3 and 1e−5 to identify contaminants. In this step, transcripts of viruses, archaea, bacteria, and fungi were removed from the dataset. (ii) CD-HIT-EST v2 (Fu et al., 2012) was used to find unigenes (options: alignment coverage of 0.9, word length of 8, and cluster to most similar cluster (g) of 1). When multiple transcripts had the same BLAST hit, only the longest was retained. After each cleaning step, a completeness check was performed using Universal Single-Copy Orthologs (BUSCO) software v3 (Simão et al., 2015).
With these predictions, the annotation with Trinotate v3.1.1 (Bryant et al., 2017) was performed by doing a homology search in BLAST (Altschul et al., 1990) and We further created a de novo mitochondrial assembly of M. pallida using MITObim (Hahn et al., 2013), which employs a mitochondrial baiting and iterative mapping approach using the MIRA assembler (Chevreux et al., 1999). We used a kmer length of 31 for bait fishing with mirabait using genome skimming data (individual 92,010, Murmeltierhuette, ERS4357532; SAMEA6593248; T. Dejaco, unpubl.) and the Songmachilis xinxiangensis mitochondrion as baiting sequence (He et al., 2013) (with MIRA v4.0.2 (Chevreux et al., 1999)) with 30 iterations. We used ORFfinder to search for insect mitochondrial ORFs in the mitochondrial contig, and from the resulting ORFs, we blasted the 10 longest hits using smartBLAST (NCBI, 2021). We also annotated the assembled mitochondrion using MITOS (Bernt et al., 2013).

| Population genetics, haplotype networks, and Neighbor-Joining trees
We extracted genotypes for mitochondrial variants, as well as common and rare variants using custom python scripts (mafFltr.py). We converted the genotypes of all three variant types into Nexus format (vcf2nex.py) to compute distance matrices and obtain Neighbor-Joining (NJ) trees for all three variant types in R using the ape package (Paradis & Schliep, 2019), and to obtain haplotype networks for rare and mitochondrial variants using the pegas package (Paradis, 2010).
We further calculated Hamming distances from the diploid (mt) and triploid (common and rare) variants for Multidimensional Scaling (hereafter referred to as Principal Coordinates Analysis, PCoA) in R (R Core Team, 2020). Additionally, we calculated pairwise distances among populations and individuals using Nei's genetic distance for both mitochondrial and nuclear variants in R using the StAMPP package (Pembleton et al., 2013).

| Transcriptome assembly and mitochondrial reads
The mean read number per individual after quality trimming, de-contamination, and normalization amounted to 8 million (M) reads (SD = 1.7 M) (see Table A1), resulting in a total of 359 M reads used for the assembly. The mean coverage and GC content per individual was 8.6 (SD = 1.8) and 41.1% (SD = 0.9%), respectively (

| Alignment and variant calling
Across

| Population genetics, haplotype networks, and Neighbor-Joining trees
In the mitochondrial haplotype network, we found 11 haplotypes  Principal coordinates of 11,616 rare nuclear variants explained about 26% of the overall variance across the 17 individuals. Figure 4 and Figure    Population genetic theory predicts that asexual populations should (1) have lower effective population sizes (Barton & Charlesworth, 1998), and (2) be less efficient in their adaptive potential, since beneficial mutations would be lost more easily in asexual compared with sexual populations (Muller, 1964). A beneficial mutation needs to confer a strong selective advantage to avoid loss through clonal interference and Muller's Ratchet (Charlesworth & Charlesworth, 1997;Felsenstein, 1974;Fisher, 1930;Hill & Robertson, 1966;Muller, 1932Muller, , 1964. However, transiently abundant beneficial mutations that do not go to fixation might be common in asexual populations, which might experience a leapfrog effect, where the common genotype is less closely related to the immediately preceding common genotype, but more closely related to earlier genotypes (Gerrish & Lenski, 1998). In other words, two individuals sampled from different asexual populations might appear to be more closely related to each other than to individuals from their own population if there is more than one asexual lineage present therein.
The combination of asexuality and polyploidy as suggested for M.
pallida is generally assumed to be relatively rare (Burch & Jung, 1993;Dufresne & Hebert, 1994;Otto & Whitton, 2000;White, 1973), and most obligately asexual lineages in plants and animals have evolved relatively recently (Maynard Smith, 1978). Moreover, it was found that polyploid populations more often tend to have multiple rather than single origins in plants (Soltis et al., 1993;Soltis & Soltis, 1999) and in animals (reviewed in Otto & Whitton, 2000, see also Chaplin & Hebert, 1997, Dufresne & Hebert, 1994. Such multiple origins of polyploidy are thought to arise via either a high rate of polyploidization during initial establishment of a given lineage and/or recurrent gene flow with related diploid taxa (Otto & Whitton, 2000).
Based on the assumptions stemming from the aforementioned theoretical work, we will now discuss the significance of the patterns cluster (Figure 3, Figure A2, Tables A5 and A6). While localities K and L are very close to each other and connected by a ridge, P is demarcated by a valley and O is farthest away and separated by a larger valley. Therefore, the P cluster may, at least partly, also mirror a geographic pattern. However, P is the locality with the highest number of samples, and we cannot exclude a sample size bias in our analyses.
Genetic distances among all sampled mitochondrial haplotypes are low and thus do not provide direct support for a hybridization event between two distinct lineages.
The common nuclear variants show a more heterogeneous picture but are broadly consistent with the mitochondrial results confirming the differentiation of the geographically separated central Alpine and southern localities as well as the P cluster. We note, however, that the differentiation between North and South is less clear and more gradual (Figure 4, Figure A3, Tables A5 and A6). Moreover, we do not find support for the existence of two main nuclear clusters suggested by an earlier admixture analysis on AFLP data  (discussed below).
The results from the rare nuclear variants are in stark contrast to the other two marker types. We do not find any obvious patterns mirroring populations or geographic distances with the sole exception of the P cluster in the PCoA, which may be driven by sample size (Figure 4, Figure A4).
For the interpretation of the observed patterns, we consider four simplistic scenarios that differ in whether parthenogenesis emerged before or after populations split and in the presence and absence of migration ( Figure 6). Assuming a single origin of parthenogenesis in an unstructured population followed by immediate dispersal of this lineage to the current locations (Figure 6a,c), we would not expect to see clustering by geography. Later migration between nearby populations ( Figure 6d) could, however, create a signature of geography and isolation by distance and, for example, explain the coherence of the central Alpine cluster. Machilis species are considered slow dispersers based on the fact that they lack wings and due to the high degree of endemism in the genus. However, their actual dispersal potential has not been quantified so far (Sturm & Machida, 2001, p. 61/62). A related scenario would involve a single origin of parthenogenesis followed by an extended time period allowing for range expansion and differentiation of the parthenogenetic lineage and subsequent fragmentation, yielding the current distribution pattern.
However, as mentioned before, there could be different scenarios, congruent with the observed data. For instance, the neighborjoining tree constructed from common nuclear variants (Figure 5c) proposes a closer relationship between the two southern populations consistent with the existence of central and peripheral refugia.
Additionally, the aforementioned leapfrog effect may also affect the observed relationships between populations, and thus, interpreting these patterns warrants caution.
Alternatively, the clusters visible in the mitochondrial and common nuclear data may reflect population structure already present in a more widespread sexual progenitor, whereas the lack of structure in the rare nuclear variants may be attributable to the presence of asexual lineages as well as limited resolution due to the small sample sizes for single populations. In this scenario, multiple transitions to parthenogenesis after the build-up of population structure need to be invoked (Figure 6b,e,f) but it would not be necessary to postulate migration to explain the geographic clusters (Figure 6e,f).
The fact that the mitochondrial variants show more distinct clusters compared with the common nuclear variants is consistent with the smaller effective population size of mitochondrial DNA and therefore a stronger effect of drift. Alternatively, common nuclear variants could exhibit the aforementioned leapfrog effect (Gerrish & Lenski, 1998). That is, if enough common variants show a pattern, where an individual appears more closely related to a conspecific from a different population, then this could lead to common nuclear variants exhibiting a different picture than rare nuclear or mitochondrial variants, where individuals from different populations form tight clusters. However, while common nuclear variants do in fact show such a pattern, we cannot make a definitive statement as to the underlying processes. While we focus on a few simple scenarios here, we note that arbitrarily complex scenarios including combinations of multiple, temporally separated origins of parthenogenesis, rare sexual reproduction, colonization and extinction cycles, and stepping stone models could also fit our data. Wachter et al. (2012) proposed central and peripheral nunatak survival based on the geographic distribution of mitochondrial COI haplotypes and two nuclear AFLP clusters across three sampling localities. Under the assumption of slow and limited dispersal of this apterygote insect species, our data could be explained by such a scenario. In contrast to these previous results, however, we find no evidence for the presence of two distinct nuclear clusters but instead reveal a more complex population structure. This may be due to differences in marker types and sample size or due to the k = 2 conundrum (Janes et al., 2017), the tendency of the deltaK method (Evanno et al., 2005) to frequently identify k = 2 as top hierarchical level in STRUCTURE analyses (Pritchard et al., 2000). Moreover, our strategy of leveraging the information contained in sets of different variants highlights possible, more complex variations of this simplistic scenario as outlined above despite the small sample size.
Therefore, applying this approach to a larger sampling scheme, both in terms of populations and individuals, combined with simulation studies holds great potential to fully elucidate the distribution and migration patterns during the ice ages, rigorously test competing hypotheses and date dispersal events as well as the onset of parthenogenesis.
Polyploidization is highly associated with parthenogenetic reproduction in animals and is also frequently correlated with hybridization (Otto & Whitton, 2000). It is conceivable that polyploidization events were accompanied by transitions to parthenogenesis in M.
Moreover, since multiple origins of polyploidy are not uncommon (reviewed in Chaplin & Hebert, 1997;Dufresne & Hebert, 1994;Otto & Whitton, 2000;Soltis et al., 1993;Soltis & Soltis, 1999), the putative joint occurrence of parthenogenesis and polyploidy would neither contradict scenarios involving a single or multiple origins of parthenogenesis. To actually assess these relationships, however, additional work is required.

| CON CLUS IONS
In this study, we assess genetic patterns for different variant types in a parthenogenetic, triploid, and endemic Alpine bristletail species.
We demonstrate that mitochondrial and common nuclear variants mirror geographic patterns. Moreover, we highlight that different types of variants capture different aspects of the evolutionary history of the species and outline the potential of their combined consideration for unraveling more complex scenarios. We emphasize that M. pallida and the genus Machilis, in general, represent an interesting study system for the evolution of different sexual strategies, polyploidization, and genome re-organization, as well as for adaptation to environmental change. We also highlight the need for further studies and the presented novel resources, transcriptome, and mitochondrial assemblies together with transcriptome data for 17 individuals, will facilitate future research in this diverse system.

CO N FLI C T O F I NTER E S T S TATEM ENT
None.

DATA AVA I L A B I L I T Y S TAT E M E N T
DNA sequences: European Nucleotide Archive (ENA) study accession ERP120116 Individual accessions and sample information: TA B L E A 5 Pairwise distance matrix between individuals using Nei's distance. Genetic distances using nuclear not fixed variants (below the diagonal) and mitochondrial variants (above diagonal).

TA B L E A 6
Pairwise distance matrix between populations using Nei's distance. Genetic distances using nuclear not fixed variants (below the diagonal) and mitochondrial variants (above diagonal).